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SUMMARY 

This paper presents a strategy for the computation of structures with repeated patterns based on 
domain decomposition and block Krylov solvers. It can be seen as a special variant of the FETI 
method. We propose using the presence of repeated domains in the problem to compute the solution 
by minimizing the interface error on several directions simultaneously. The method not only drastically 
decreases the size of the problems to solve but also accelerates the convergence of interface problem 
for nearly no additional computational cost and minimizes expensive memory accesses. The numerical 
performances are illustrated on some thermal and elastic academic problems. Copyright © 2000 John 
Wiley & Sons, Ltd. 
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1. INTRODUCTION 

Industrial structures often exhibit repeated patterns. Indeed the use of repetitions simplifies 
the design and enables to reduce costs in assembled structures because repeated components 
can be mass-produced; in addition symmetries enable to equilibrate inertia and to balance 
loadings. The existence of repetitions is naturally used when defining the CAD model of the 
structure, using "copy-paste" and rigid transformations. Unfortunately, taking into account the 
repetitions during the numerical simulation of the structure is not as an easy task. Strategies 
based on Fourier expansions [3] or homogenization techniques [5j are only valid in a limited 
range of application, most of them require either true geometrical periodicity, repetition of 
one single pattern or large number of repetitions; their application to non-periodic geometries, 
loadings and boundary conditions is not straightforward. 

The crankcase shown in figure [T] and typically used in power stations of EDF (Electricite 
de France) is a good example of the structures we dedicate our method to: it is composed of 
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repeated cooling winglets and disks, and a complex clamping system. The winglets and disks 
are repeated patterns and we will assume that the mesh of every occurrence of a given pattern 
has the same discretization (as is often the case in practice to simplify the mesh generation). 
The remainder of the structure is not a repeated pattern. The winglets and the disk have 
different orientations and thus the structure in its entirety cannot be considered as quasi- 
cyclic. The classical method to simulate the carter would be to obtain a complete mesh of the 
structure and to solve this large system (possibly using a domain decomposition algorithm 
with automatic substructuring software). 



In the present work we propose a numerical strategy based on domain decomposition 
methods [SJ [TTJ 0] (see [5] for a review) and block-Krylov solvers jTOl [TJ [5] . The main idea 
behind the strategy developed here is to redefine the structure as a collection of occurrences 
of patterns as described in figure [2] this structure is made of two patterns, one of which 
occurs three times. Our method exhibits the following properties compared to classical domain 
decomposition methods: 

• only the patterns need to be meshed (not their occurrences), 

• the computational operations are realized more efficiently, 

• the iteration schemes takes advantage of the numerical information associated to all 
occurrences to converge faster. 



The paper is organized as follow. First, basics on domain decomposition and Krylov solvers 




Figure 1: EDF carter (courtesy of O. Boiteau) 




Figure 2: Schematic view of structures with occurrences of a pattern 
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are briefly recalled. Then the method is presented on the most simple case: the linear 
thermal analysis of a fully periodic structure (one repeated pattern). The method is then 
generalized to non-periodic structures, to elastic problems and to decompositions involving 
floating substructures. Eventually, conclusion and prospects are provided. 



2. Basics on domain decomposition methods and Krylov solvers 

The method we propose is a variant of the most classical non-overlapping domain 
decomposition methods [6]. For the sake of simplicity, we restrict our presentation to the 
dual domain decomposition method (a.k.a. FETI method [5J [7]) in the context of the finite 
element analysis of a linear elasticity problem. 

Let us denote K n the stiffness matrix, vP the displacement field and f n the generalized 
forces set on domain ft. The so called "global" (or "assembled") problem to be solved is 

K n u n = jn ^ 

We consider a partition of Q into n non-overlapping subdomains denoted QS a \ and introduce 
interface internal forces A(/ S J imposed on subdomain fiw by its neighbors. We call the 
trace operator which extracts boundary degrees of freedom from subdomain Q( s \ Equilibrium 
of subdomains, equilibrium of interfaces and connectivity of submeshes read: 

= /« _ ^) T a[ s) for a = 1 ... n 



n 

E 



A (s) t {s) u (a) = q ( 2 ) 



\ { b s) =A {s)T \ fors = l...ra 

where is the signed boolean assembly operator which connects pairwise degrees of freedom 
on the interface. We have chosen to express the reactions on the interface of subdomains from 
one unique interface stress field A insuring automatically the equilibrium of the interface 
reactions. The second equation corresponds to the equality of interface displacements. In the 



6 ■ 



following subscript b stands for boundary and subscript i for internal, so that tWttW = u 

In case the Dirichlet boundary conditions of a subdomain are not sufficient to fix it, we call 
the subdomain a "floating subdomain"; the kernel of the associated stiffness matrix is 
denoted by N^, a pseudo-inverse of is denoted by + , it verifies y — y for 

any y in range(A"( s )). Eliminating the displacements from the formulation ([2]), one finds 
the dual interface problem as appearing in the Finite Element Tearing and Interconnecting 
(FETI) method [5]: 

/ tp n\ I \ \ I j\ 

(3) 




where we introduced the following definitions: 

F = J2 A«tW X-W + tW T i4W T G = (. . . A( s h^ iV« 

S 

d = ^2 AW*Wlf W + /W e T = (. . . /(*) T iV(s) 
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( \ (s) (s) (s)~ * (s) 

We will also denote local Schur complement by S (s > = K\ h — K hi K\ { K ib . It can easily 

be proven that = t^K^ + t^ T and thus F = J2 S A (s) S (s ~> + A (s ^ T . In order to take 

advantage of the additive structure of matrix F, a Krylov iterative solver is chosen to solve the 
dual interface problem ([3]). Our strategy is independent of the Krylov solver. For simplicity 
reasons we choose, as in the basic FETI methods, to use the conjugate gradient. We hence 
suppose that F is symmetric positive definite (properties which are directly inherited from 
matrix K n ). 

One key point of coupling domain decomposition methods and Krylov iterative solvers is the 
preconditioner F . Various versions have been proposed for the FETI method: the so called 
"Dirichlct" , "lumped" , "superlumped" and "identity " preconditioners (in increasing order of 
efficiency and computational burden): 

p-i = A^A^S^A^A^ F^ 1 = A^A^K^A^A^ (4) 

S S 

F s l = Y, A(S) ^ (S) diag(^V (s)T A (s) Ff 1 = I (5) 

s 

where A( s ) is a diagonal scaling operator (^ s A^ — I). 

Another keypoint in solving the dual interface problem is the coarse problem imposing at 
every iteration the self-equilibrium constraint of the interface forces, the second set of equations 
in ([3]). The coarse problem is solved using a projector Q and initialization Ao, defined by 

Q = I-F- 1 g(g t F- 1 G^ Cf (6) 
-l 



A = F~ 1 G [G T F~ 1 G) e (7) 



where F -1 can be any of the preconditioners introduced above. 

The standard FETI solver is summarized in the left column of table |U 

Conjugate gradient consists in building a i^-orthogonal basis (wi)o^i^m of Krylov subspace 

K-m (zo, QF~ 1 Q T F^j — span (^zq, . . . , (qP~ 1 Q t F^j z^j and finding approximation A m G 

Ao + K-m f^o, QF~ 1 Q T F^j so that at each iteration the error ||A — A m ||F is minimized (which 
is equivalent to making the residual r m = (d — FX m ) orthogonal to /C m ). Because of the 
good conjugation properties of basis (wi) the optimization is decoupled and only one scalar 
coefficient a m is sought for at each iteration so that ||A — A m _i — a m w m \\F is minimized. 



3. Modification of the FETI method for periodic structures 

For pedagogical reasons the method is first presented in the simplest case, namely a structure 
made of a single pattern repeated in a periodic manner; more complex situations are considered 
in the section following this one. 

3.1. Principles of the method 

In this toy problem, we consider a periodic structure. In order to introduce concepts without 
excessive technical difficulties we first focus on scalar problems so that no local frame is 
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Table I: Dual Schur complement with conjugate gradient 





Classical version 


Multivector version 


1 


Initialize with arbitrary Aoo 


2 


Compute A = F^GiG 7 'F^G^e + QA 00 


3 


Compute r — Q T (d - FX Q ) 


4 


Zq = QF 1 r set w = z 


z o — QF 1 r , o set wo — zq obtain wo 
through permutations 


5 


for j = 0, . . . , m 


5.1 


Pj = Q T Fwj 


pj = Q T Fwj 


5.2 




QLj = {wj T P J )~ 1 {w 3 Tr j) 


5.3 


x i ■ i = -V. + "/"'./ 


Aj+i = A, + wjatj 


5.4 


= r 3 - a iPj 


r j+1 = r.j - pjctj 


5.5 


Zj+i = QF- x r j+ i 


Zj+i = QF~ 1 rj + i obtain Zj+\ through 
permutations 


5.6 


= -(pJ z 3 + l)/(wJPj) 


= -(Wj T Pj)~ 1 (Pj Zj + l) 


5.7 




w j+1 = z j+ i + Wj(3_^ 



required. All substructures thus have the same local Schur complement — S®. To illustrate 
our discussion, one can think of the toy problem of determining the temperature field on a ring 
whose internal face is submitted to a given temperature (so that no zero energy mode exist in 
our problem), the external face being submitted to non-periodic heating. To fix the ideas, we 
assume the pattern to be a third of the structure, see figure [3] 




Figure 3: "Donut" toy problem 

The main assumption underlying the method is that the numerical information generated 
at the interface of one occurrence is pertinent for all identical occurrences in the system. So, 
given a search direction Wi and instead of "simply" finding optimal length a, a block of search 
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direction Wi is generated by permutation of the information. Referring to figure [3] for the 
definition of interfaces, 











w (2,3) 


W (3A)\ 


Wi = 


U,(2,3) 


— > m = 


w (2,3) 


w (9,l) 


w (l,2) 




^(3,1) J 




^(3,1) 


w (l,2) 


w (2,3) y 



The undertilde notation is used to denote the block of permutations. As can be seen three 
different directions of descent have been constructed by simple permutation of the interface 
partitions of the preconditioned residual Wi. Then optimization can be realized simultaneously 
on these three search directions and an optimal linear combination of these search directions 
(represented by vector a) can be deduced so that CEj+x = %i + U>i&i- Obviously the search 
directions Wi+i for the next iteration have to be made orthogonal to previous. 

Depending on the loading and the structure, the method may converge much faster than 
classical methods. In any case the convergence can not be deteriorated because the initial 
information is still present in the multivector basis. The algorithm is presented in the right 
column of table |U As can be seen it is quite similar to the classical conjugate gradient since 
only a few more operations are involved. The following section discusses how an efficient 
implementation can be obtained from the pattern-based framework. 

It is to be noted that the algorithm presented in the right column of table U applies 
independently of the way the multivector is generated and thus, as shown in section 21 it can 
be applied to non-fully-periodic structures. In the case of truly-periodic structures (namely 
where the interface trace operators A 1 --^ are cyclic) computational costs can be minimized 
when performing products and when orthogonalizing the new research direction with respect 
to previous ones: indeed, because F and F^ 1 themselves exhibit periodicity properties, the 
multivectors pj and Zj+± can also be deduced from one single vector. For instance, it is sufficient 

to compute (line 5.1) pj = Q T Fwj then to deduce pj and Wj; it is also sufficient to compute 

(line 5.5) Zj + \ = F~ 1 rj + i then to make Zj+i orthogonal to previous multivector of search 
directions Wj ((line 5.6) (5. = — (?^j T £j) HpJ z j+i) an d (line 5.7) Wj+i = z^+i +Wj[5?) and to 
finally deduce Wj+i through permutations. 

In the following section we discuss how the computations in the multivector FETI algorithm 
can be organized efficiently for periodic structures. 

3.2. Efficient implementation of the blocked algorithm 

It is first to be noticed that our method introduces nearly no additional computations compared 
to the basic method. The main differences are that dot-products are replaced by matrix- 
products {pj T yij^j an d that the resulting square matrix needs to be factorized. Moreover, giving 
the pattern a predominant role leads to the rewriting of all the operations in a computationally 
more efficient way: the main idea is that operations will be realized simultaneously on blocks 
of vectors (instead of on single vectors) . 

The principle is that the complete interface can be observed from the point of view of the 
pattern with each occurrence of the pattern describing one part of the complete interface. 
Classically the interface vector w is stored as subdomain contributions themselves split 
in interface contributions w^^K Here however the complete interface is stored as a matrix, 
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named W, of local contributions (refer to figure [3]) 

/ w (l,2)\ 



w (2,3) 
,(3T) 



w (l,2)\ / w (l,2) w (2,3) (3,1) 

= W = (9) 

\ w (h3) l w (l,3) (2,1) ^(3,2)' 



where we define tuW) = —w^ ,l \ Clearly W stores the same information as w. The redundancy 
inside W , i.e. storing and tuwVO corresponds simply to storing the information on 

different subdomains as is usually done anyway in parallel implementations when organizing 
the communications. In the following capital letters refer to quantities stored in this specific 
"block" format. 

Let be the Schur complement of the pattern which, in the case of thermal problems, 
is identical for all its occurrences. Computing a product Fw — A^'S^ A^ w can be 

realized by computing + W and using adapted assembly operations. The interest of such 
an implementation is that the solutions for all the right hand sides (i.e. all columns of W) 
are computed simultaneously, which enables to minimize the cost of memory transfers which 
represent a significant contribution to the overall cost in sparse matrix operations. 

Let us now analyse the extra cost associated to the minimization of the residue on a set of 
research directions. The issue of applying F and F" 1 on multivectors was discussed above. One 
must however note that to put the result of the operation in the multivector format as described 
for W in © a slightly modified implementation of the assembly procedure is required. The 
matrix operations such as p T w can directly be deduced from the computation of P T W (using 
again a modified assembly procedure). In this way this operation is hardly more expensive than 
simply computing the vector dot product p T w. Finally the only significant extra-cost is caused 
by the factorization of matrix p T w; since, in practical problems, the number of occurrences of 
a pattern is not extremely high the dimension of this matrix is expected to remain small. The 
issue related to the possibility of this minimization matrix to become singular is discussed in 
following section. 

3.3. Numerical issues 

As said earlier the proposed algorithm optimizes many computations associated to domain 
decomposition methods, and only introduces one additional operation: the factorization of 
matrix (p T w) required during the computation of optimal descent coefficients ctj. It might 
occur that for some iteration this matrix became non-invertible or at least very poorly 
conditioned. The bad conditioning of matrix (p T w) can be traced back the non full-column- 
rankness of the block of search directions w which is itself associated to a redundant information 
between (at least) two interfaces. In that case the use of a pseudo inverse is sufficient to avoid 
breakdowns. 

It has been observed that this algorithm, due to the fact that it handles several directions 
simultaneously, is more sensitive to roundoff errors than classical conjugate gradient schemes. 
For that reason full re-orthogonalization of the directions of descent is often used in order to 
make the iteration procedure more robust. The ^-normalization of search directions is then 
a simplifying feature; after the computation of 77 = (p T w), one substitutes w 4— wr]~^ and 

p <— pr)~i such that (p T w) = I. As a result no system of equation needs to be solved during 
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the full-reorthogonalization and residue minimization steps. Obviously the normalization 
procedure described above is the right place to handle redundancy of directions of descent 
in the multivector. 

3.4- Assessments 

3.4-1- Academical tests Here the Schur complement of the pattern is represented by a random 
dense matrix made symmetric definite positive. The Dirichlet's preconditioner is used. The 
pattern is repeated either five or nine times. The loading is random and non-periodic. Figure 
0] presents the number of iterations of the conjugate gradient to reach a given precision 
for a classical dual domain decomposition method and for the block (multivector) version 
(implemented in Scilab, |www . scilab . org| . As can be seen, using several directions of descent 
per iteration can reduce the number of iteration by about 35% (respectively 55%) for the 
five-repetition example (respectively for the nine-repetition test), reminding that the cost per 
iteration is nearly identical. 



I°9l|r|> lcg(|r|) 




Five repetitions Nine repetitions 

Figure 4: Comparison between new (dashed) and classical (solid) algorithms 

As can be observed in almost every cases, the norms of the residual during the first iterations 
of the block and non-block methods are almost coincident. Indeed, the first iterations of 
conjugate gradient explore the higher part of the active spectrum of the problem which 
corresponds to large wavelength phenomena which are almost identical in any column of the 
multivector. Once these contributions are found, the enrichment of the search space offered by 
multivectors becomes effective and leads to significant improvement of the convergence. 

3-4-2. Donut tests In order to evaluate the performance results on more realistic cases the 
conductivity matrix of a piece of donut is obtained from FreeFem+- f- Qwww. f reef em. org) and 
imported in Scilab. The pattern in these examples has about 2 000 degrees of freedom (dof ) of 
which about 100 are on its interface. Here again the loading is random and non-periodic. The 
Dirichlet preconditioner is used. Figure [5] presents the number of iterations of the conjugate 
gradient to reach a given precision for a classical dual domain decomposition method and for 
the block dual domain decomposition method. The algorithms are assessed on the five and 
nine-part donuts. The decrease of the number of iterations is less spectacular than in previous 
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cases, although with 9 repetitions the block algorithm converges in about 50% less iterations. 



Pattern 
(Five-part donut) 



Five-part donut Nine-part donut 

Figure 5: Comparison between new (dashed) and classical (solid) algorithms 



It is also interesting to estimate the computational costs of the classical FETI solver and 
the block variant for systems with repeated patterns (and applied to periodic structure in this 
section). Here the algorithm is programmed in scilab (an interpreted programming language) 
and cpu time are only indicative. Nevertheless it is informative to compare the CPU time for 
three different variants of the FETI solver: 

• A basic ( "classical" ) FETI method where the fact that the structure is made of repeated 
patterns is not at all accounted for. 

• The same classical FETI algorithm but where the periodicity is utilized to compute 
efficiently Fw = £ s A^S^ w. Since every subdomain has the same local operator 
all subdomain contributions in this interface problem are computed simultaneously, 
namely applying the forward and backward substitution at once for several right hand 
sides. Clearly this algorithm will exhibit the same convergence history but the cost related 
to the factorization of is paid only once, and the cost per iteration is significantly 
reduced. This version of the FETI method will be called the classical FETI-mrhs. 

• Finally we apply the multivector FETI algorithm as proposed in this paper where several 
directions of descent are considered per iteration whereas the cost per iteration is similar 
to the FETI method denoted above as the classical FETI-mrhs. 

The cpu times are listed in Table [TTJ Obviously the performance of the classical FETI is 
significantly penalized by the fact that no use is made of the property that local operators 
are identical; nevertheless if implemented on a multiprocessor computer and assuming perfect 
parallelism (i.e. dividing the cpu time for the classical FETI by the number of domains) the 
CPU time would be very close to the one of the classical FETI-mrhs. Hence in a parallel 
computing setting using the fact that local operators are identical as in the classical FETI- 
mrhs approach is not very advantageous. When applying the multivector FETI however faster 
computing will be achieved thanks to the fact that the entire information generated at every 
iteration is exploited in the multiple search directions, thereby leading to a faster convergence 
with a similar cost per iteration. It amounts to a gain of 50% in computing time in our test 
for the 9-part donut case. 
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5-part donut 




classical FETI 


classical FETI-mrhs 


multivector FETI 


number of iterations 


5 


5 


4 


CPU time (s) 


31.04 


6.41 


6.37 


9-part donut 




classical FETI 


classical FETI-mrhs 


multivector FETI 


number of iterations 


9 


9 


5 


CPU time (s) 


72.6 


8.49 


4.39 



Table II: Performance of the multivector approach for thermal donut problems 



4. Extensions 

4--1- Non- fully periodic structures 

When a structure can not be decomposed into several occurrences of a single pattern, it 
cannot be treated with the procedure described in the previous session dealing with periodic 
structures. It is to be noticed that boundary conditions and/or connection conditions between 
subdomains can also alter the periodicity. Figure [6] presents different cases where the strategy 
developed in the previous section needs to be slightly modified to still be able to make efficient 
use of the presence of patterns. 




(a) Donut with one stand (b) Donut with two stands (c) Non periodic Dirichlet bc's 



Figure 6: Various extensions 



1. 1.1. Donut with one stand Let us consider a donut with one stand (subdomain 4) connected 
to subdomain 3 (figure (6]a). There are thus two patterns, one of which being instantiated 
three times. The topology of interfaces is modified and the system cannot be seen as periodic 
anymore. Nevertheless one can still build a multivector that makes use of the information 
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computed on the interfaces between repeating subdomains. The simplest multivector one can 
consider is: 









w (2,3) 


U,(3,l)\ 


w (2,3) 


W = 


U,(2,3) 


U,(3,l) 


^(1,2) 


w (3,l) 




W (3,D 


w (l,2) 


„,<2,3) 






^(3,4) 


^,(3,4) 





(10) 



In the present work the method is assessed on academical matrices (randomly generated as 
in the examples of section l3.4.ip . Each interface is of dimension 20 and either 5 or 9 repetitions 
are considered. In figure [7] we plot the convergence history of the classical and the multivector 
FETI. It is observed that the exploitation of the repetition leads to high efficiency, the number 
of iterations being divided by a factor greater than 2. It is also remarkable that the multivector 
method is independent on the number of repetitions. 




Five repetitions Nine repetitions 

Figure 7: Comparison between new (dashed) and classical (solid) algorithms 



It thus seems that the proposed multivector FETI approach leads to convergence efficiency 
that is scalable with the number of repetitions. A possible interpretation of this result is 
that the exploitation of the repetition has the same algorithmic effect as the use of a coarse 
grid problem (typically associated to rigid body motions of floating substructures): at each 
iteration the residual is made orthogonal to a subspace augmented by the permutations (that 
is to say a space that is larger than the single vector of classical conjugate gradient). The 
permutations enable to transmit instantaneously a global information on the whole structure 
which is precisely the aim of coarse grid problems. 
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4-1-2. Donut with two stands In the case of two identical stands (figure [51b) new partial 
periodicity appears. It is possible to consider the following enriched multivector: 









w (2,3) 


W (3A) 


w (l,2) 


U;(2,3) 


U,(3,l)\ 


w (2,3) 




U,(2,3) 


w (3,l) 


^,(1,2) 


U,(2,3) 
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w (3,l) 
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^(3,1) 




w (2,3) 
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U;(l,2) 


^(2,3) 


W (1,S) 




U,(l,5) 


w {lfi) 


^(1,5) 


U,(2,4) 


^(2,4) 


w (2,4) 






^(2,4) 


W (2A) 




w (l,5) 


^(1,5) 


w (l,5) y 



Of course one danger of this philosophy is an exponential increase of components in the 
multivector as the number of patterns and repeated interfaces grows. Proper strategies to 
choose the permutations are yet to be developed in that case. 

4-1.3. Dirichlet he's Let us now consider the case of one piece of the donut having part of 
its outer frontier submitted to Dirichlet conditions (imposed temperature, figure |6]c); in that 
case the Schur complement of the subdomain should be modified. A first option would be 
to consider that the structure is made out of two patterns (the two free pieces versus the 
clamped one). Another option, following the Total FETI method proposed in [3], is to use 
Lagrange multipliers to impose the boundary conditions and thus the Schur complements are 
always associated to free substructures so that all subdomain can be represented by the same 
pattern. 

4-1-4- The preconditioning problem In any of the last two cases we showed that non-periodic 
interface conditions (extended to the boundary of the subdomain) could be integrated into our 
acceleration framework. One key point which gives precedence to the dual approach is that 
the dual Schur complement involves the resolution of Neumann problems associated to matrix 
if( s ) + which is independent on the location of the imposed fluxes, so that a block treatment 
of the Fw product is always possible. 

The problem is different when dealing with the Dirichlet prcconditioncr since it is strongly 
dependent on the location of the Dirichlet conditions (which modifies the splitting between 
interior and boundary degrees of freedom). Thus the theoretically optimal preconditioner is 
made out of contributions which are potentially different even for subdomains associated to 
the same pattern. As an example, consider again the structure with one stand (figure [6]a): the 
Dirichlet operator of subdomain 3 is different from the one of subdomains 1 and 2 because it 
should integrate the action of the stand. Thus a block treatment of the optimal preconditioner 
is complex to manage; anyhow lumped strategies are still available for block treatment and 
should lead to good performance results. 

4-2. Linear elasticity problems 

For now we have considered thermal problems because the research of a scalar unknown leads to 
simpler writing. To use previous strategy for elastic problems one has to take into account the 
rotations which are necessary to transform the pattern to one of its occurrences. These rotations 
modify the interpretation of the degrees of freedom of the interface nodes. Nevertheless this 
additional local rotation step does not modify the strategy proposed here (see for instance [5] 
for a detailed formulation of periodic elastic problems in the dual setting). 
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More precisely two modifications are required: 

• The assembly operator needs to incorporate rotation from local to global frame. Since 
Schur complements are denned in the pattern's frame, not only the assembly operator 
selects degrees of freedom associated to the interfaces of specific occurence but it 
expresses them in the global frame: goes from the structure's frame to the pattern's 
frame and sends back data defined on the pattern to data defined on the structure. 

• The permutations which enable to obtain multivectors (w to w) also need to take into 
account the rotation. Data defined on interface d 1,2 in the structure's frame are not 
directly pertinent for interface d 2,3 , they need to be transformed to be defined in the 
same frame. 

These operations are easy to implement and do barely incur additional computational costs. 

As an example we consider the "fully-periodic" donut as a bidimensional elasticity problem 
(plane strain). The internal ring is clamped and loading is the result of a random process. The 
pattern has about 2000 degrees of freedom (1000 nodes) of which about 100 are on its interface. 
The acceleration due to the use of multivectors is very similar to the one observed for thermal 
problems, performance are summed up in table Hill Here again the use of multivectors seems 
to provide numerical scalability, CPU gain settles around 25% (which could be improved if 
some redundant operations where suppressed in the yet non-optimal implementation of the 
multivector approach). 



5-part donut 




classical FETI-mrhs 


multivector FETI 


number of iterations 


13 


10 


CPU time (s) 


45.25 


34.82 


9-part donut 




classical FETI-mrhs 


multivector FETI 


number of iterations 


15 


10 


CPU time (s) 


43.23 


30.09 



Table III: Performance of the multivector approach for elastic donut problems 



4-3. Floating substructures 

Part of the treatment of rigid body motions is simplified in our framework because it can be 
realized at the scale of the pattern (and not at the level of its occurrences). For instance in 
the fully-periodic elastic donut problem, assuming the pattern is only fixed by a hinge at the 
central node of its internal ring, each occurence has one rigid body motion (rotation around 
the hinge) and the structure has enough Dirichlet boundary conditions (see figure [8J . 
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Figure 8: 

structure 

hinges 



Let i°iV 



3-piece 
with 



5-part donut 




classical FETI-mrhs 


multivector FETI 


nb. iterations 


12 


10 


CPU time (s) 


44.72 


36.23 


9-part donut 




classical FETI-mrhs 


multivector FETI 


nb. iterations 


14 


10 


CPU time (s) 


38.27 


28.28 




Table IV: Elastic donut problems with floating substructures - 
where a and b stand for the two interfaces of the pattern, the matrix 



G of the trace of the rigid body modes in FETI writes (see equation (jjJJ) 



G 



/ G (l,2) G (2,l) \ 

G( 2 < 3 ) G^ 
K G^ G^j 



( ( a ] N a -Of ] N b 
O^N a 







\-0£>N b 



-Of ] N b 



(12) 



where matrix Oa is the rotation matrix which transforms interface a of the pattern into 
interface a of occurence (1). 

The computation of the coarse grid operator (G T F~ 1 G) sums up to the evaluation of the 
reaction of the pattern to traces of rigid body modes on its interfaces. So one just has to 
introduce matrix Gbiock defined in the frame of the pattern: 



Gbio 



ck 




(13) 



and deduce (G T F 1 G) from the computation of G-[ lock (AS^ A T )Gbiock (A is the block 
adaptation of the scaling matrix A^, is the chosen approximation of the primal Schur 
complement of the pattern). Hence the coarse grid problem can be built by computing the 
following responses of the pattern: its response when submitted to a rigid body mode on its 
entire interface and its response to the trace of the rigid modes on each of its interface segment 
successively. 

Results presented in table HVl (obtained with — I) show that rigid body motions have no 
influences on the performance of the multivector technique. Such a result is in good agreement 
with previous analysis: rigid body motions are associated to a large wavelength information 
whereas permutations deal with small wavelength phenomena so that the two effects reinforce 
one another. 
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5. Conclusion and outlook 

This paper presents the first developments around a new approach to compute structures 
made out of repeated patterns. The solution algorithm is based on a nonoverlapping domain 
decomposition method (in order to extract the patterns from the mesh) and on block Krylov 
solvers (in order to distribute time consuming operations and optimize the solution procedure 
on a larger subspace). The method always converges at least as fast as classical method with 
a cost per iteration only slightly higher. We have illustrated the efficiency of the proposed 
strategy on simple examples. 

Few drawbacks need to be investigated yet, mostly the problem of the generation of 
redundant information (which leads to handling too much data and to numerical difficulties). 

The extension of the method to more realistic problems is a rather complex task which 
at first implies an object oriented description of the mesh. The entire system is then based 
on the patterns and connections between their different instantiations. This requires some 
adaptation of usual codes and probably a better cooperation with CAD modelling tools 
(where the information related to the existence of the patterns and their occurences is directly 
available). Moreover, it is important to note that the same way rotations are used to transport 
information in the correct frames, non-boolean assembly operators (as obtained with mortar 
techniques) could be employed to deal with non-matching meshes and transport information 
from one discretization to another, which would avoid to impose too complex contraints on 
the meshing process (patterns could be meshed independently from one another, and faces of 
the same pattern would not need to have the same discretization). 

Nevertheless we believe that handling such a description would lead to high computational 
benefits: only patterns would be meshed (saving memory) and would communicate with one 
another larger amount of data at one time (saving message passing time), operations would 
always be realized on blocks and minimization would be realized on larger subspaces. 
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